Received 12 July 2003
Accepted 30 July 2003

The aetiologic agent of the recent epidemics of Severe Acute Respiratory Syndrome (SARS) is a positive-stranded RNA virus (SARS-CoV) belonging to the Coronaviridae family and its genome differs substantially from those of other known coronaviruses. SARS-CoV is transmissible mainly by the respiratory route and to date there is no vaccine and no prophylactic or therapeutic treatment against this agent. A SARS-CoV whole-genome approach has been developed aimed at determining the crystal structure of all of its proteins or domains. These studies are expected to greatly facilitate drug design. The genomes of coronaviruses are between 27 and 31.5 kbp in length, the largest of the known RNA viruses, and encode 20-30 mature proteins. The functions of many of these polypeptides, including the Nsp9-Nsp10 replicase-cleavage products, are still unknown. Here, the cloning, Escherichia coli expression, purification and crystallization of the SARS-CoV Nsp9 protein, the first SARS-CoV protein to be crystallized, are reported. Nsp9 crystals diffract to 2.8 Å resolution and belong to space group P6₁22 or P6₅22, with unit-cell parameters a = b = 89.7, c = 136.7 Å. With two molecules in the asymmetric unit, the solvent content is 60% (V_M = 3.1 Å³ Da⁻¹).


1. Introduction 

The recent epidemics of Severe Acute Respiratory Syndrome (SARS) are a paradigm for emerging viral pathogens, as well as an example of worldwide coordinated efforts to control a serious viral outbreak and a test of the reaction time of the scientific community. The first cases of SARS originated in Guangdong province in south-east China. The number of reported cases and our knowledge of this illness are still evolving, but a number of basic facts have been firmly established. The aetiologic agent of SARS is a positive-stranded RNA virus belonging to the Coronaviridae family and its genome differs substantially from those of previously identified coronaviruses, including two other human coronaviruses (Peiris et al., 2003; Ksiazek et al., 2003; Drosten et al., 2003; Snijder et al., 2003). The virus, for which the name SARS-CoV is now accepted, is mainly transmitted by the respiratory route. However, evidence for a secondary faeco-oral route of transmission has also been presented. The viral strain probably first infected wild animals traded in Asian markets and then crossed the species barrier to infect humans.

There is to date no vaccine and no prophylactic or therapeutic treatment against this agent. A prophylactic treatment would have been useful to combat the epidemics; the only effective measure available to prevent the spread of the virus is to quarantine all persons who have been exposed to SARS-CoV. The number of antiviral molecules that can be used to treat patients infected by RNA viruses is very low. Accordingly, it is important to search for efficient antiviral drugs for a large number of RNA viruses, while giving priority to viruses transmitted by the respiratory route because they have the highest potential for causing pandemic outbreaks.

The scientific community has reacted promptly and efficiently to identify and characterize this new infectious agent, as well as to develop methods for SARS-CoV detection and containment protocols. In the meantime, a wide effort is being made to design drugs active against SARS-CoV. Ribavirin has been used in the absence of other candidates, but its intrinsic efficiency against SARS-CoV appears to be low (Koren et al., 2003).

To select drugs active against a viral pathogen, one usually relies on screening candidate drugs for their efficacy in virus-infected cell cultures and/or animal models. However, during the current research on drugs for treating hepatitis C virus (HCV) infections, a novel and promising approach has been introduced. The RNA-dependent RNA polymerase of HCV has been purified and crystallized and enzymatic tests have been used to find potent nucleoside and non-nucleoside inhibitors of the virus, the structure-activity relationships of which allow further testing and clinical development (de Francesco et al., 2003). This approach is gaining momentum owing to the concomitant increase in the power of new technologies. Among those, genomics approaches are being used to solve the crystal structures of large sets of clinically relevant proteins, which will become the subjects of future structure-function relationship studies.

A crystal structure has not yet been determined for any of the 28 predicted mature SARS-CoV proteins. The crystal structure of the main (or 3CL) protease of transmissible gastroenteritis virus, a related coronavirus, has been determined and was used to construct a model of the SARS-CoV 3CL protease, facilitating future drug design against this important target (Anand et al., 2003). The putative coronavirus RNA-dependent RNA polymerase has been purified, but is inactive in vitro (Grotzinger et al., 1996).

In this context, we have developed a SARS-CoV whole-genome approach aimed at determining the crystal structure of all SARS-CoV proteins. We anticipate that this will greatly facilitate drug design as well as the study of many other aspects related to the biology of these complex viruses.


Coronaviruses are enveloped viruses with a single-stranded RNA genome of positive polarity (Lai & Holmes, 2001). Their genome is between 27 and 31.5 kbp in length, the largest of the known RNA viruses. Like other coronaviruses, the SARS-CoV genome is known to encode two large replicase polyproteins (the ORF1a and ORF1ab proteins), which are processed into a set of mature non-structural proteins (Nsps) by internal viral proteases (Snijder et al., 2003). The functions of many of these products, such as the Nsp9-Nsp10 polypeptides produced from the C-terminal domain of the ORF1a-encoded polyprotein, are still unknown. In the related mouse hepatitis virus, which is a group 2 coronavirus, the SARS-CoV Nsp9 corresponds to a 12 kDa cleavage product (p1a-12) that is found preferentially in the perinuclear region of infected cells, where it co-localizes with other components of the viral replication complex (Bost et al., 2000). No clues to the function of the Nsp9 equivalent of any coronavirus have been obtained thus far. Here, we report the cloning, expression, purification and crystallization of the SARS-CoV Nsp9 protein, a 113-residue protein (Fig. 1), which is the first SARS-CoV protein to be crystallized.

2. Material and methods 

2.1. Infection and RNA isolation 

Vero cells were infected with SARS-CoV (Frankfurt-1 strain; NCBI Accession No. AY291315; Drosten et al., 2003) at a multiplicity of infection of 0.01. At the onset of the cytopathogenic effect (approximately 40 h post-infection), intracellular RNA was isolated by cell lysis for 10 min at room temperature with 5% lithium dodecyl sulfate in LET buffer (100 mM LiCl, 1 mM EDTA, 10 mM Tris-HCl pH 7.4) containing 20 µg ml⁻¹ of proteinase K. After shearing of the cellular DNA using a syringe, lysates were incubated at 315 K for 15 min and extracted with phenol (pH 4.0) and chloroform, and the RNA was ethanol-precipitated. cDNA was obtained by reverse transcription with Thermoscript reverse transcriptase (Invitrogen) using primer SAV009 (5'-GGACAGCAACCGCTGGACAATC-3'), which is complementary to nucleotides 13644-13665 of the Frankfurt-1 genome.
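As a quick consistency check on the primer description, the sketch below confirms that SAV009 is as long as the stated target window and derives the genome-sense sequence it would anneal to. It assumes the hyphen in the printed primer is only a line-break artefact; the sequence is written in DNA letters (T in place of U), and no genome file is consulted, so the annealing position itself is not verified here.

```python
# Primer SAV009 as quoted in Section 2.1, with the line-break hyphen removed
# (assumption: the hyphen is typographic, not part of the sequence).
PRIMER_SAV009 = "GGACAGCAACCGCTGGACAATC"

# Target window on the Frankfurt-1 genome, 1-based and inclusive.
TARGET_START, TARGET_END = 13644, 13665

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Number of nucleotides covered by the stated window.
target_length = TARGET_END - TARGET_START + 1

# Genome-sense sequence expected at the target window (DNA alphabet).
genome_sense = reverse_complement(PRIMER_SAV009)

print(len(PRIMER_SAV009), target_length)
print(genome_sense)
```

The primer length and the window length agree, which supports reading the printed primer as one uninterrupted 22-mer.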

2.2. Subcloning, Escherichia coli protein expression and 
purification 

The SARS-CoV Nsp9-coding sequence was amplified by PCR from the cDNA prepared above using two primers containing the attB sites of the Gateway recombination system (Invitrogen). At the 5' end of the gene, a sequence encoding a hexahistidine tag was attached. The cDNA was then subcloned into the pDest14 plasmid (Invitrogen). The open reading frame of the final construct (referred to as pDest14/Nsp9-HN and encoding an N-terminally His-tagged version of SARS-CoV ORF1a polyprotein residues 4118-4230) was checked by sequencing (MilleGen, Toulouse, France). Expression was performed in E. coli strain C41(DE3) (Avidis SA, France) transformed with the pLysS plasmid (Novagen). This plasmid carries the lysozyme gene, allowing tight regulation of expression, and supplies the tRNAs for six rare codons that are used at very low frequency in E. coli. Cultures were grown at 310 K until OD600 reached 0.6 and were then stored for 2 h on ice; 2% ethanol was added to induce stress chaperones (Gong & Shuman, 2002). Expression was induced by adding 50 µM IPTG and cells were incubated for 16 h at 290 K. Cells were collected by centrifugation and the bacterial pellets were resuspended and frozen in 50 mM Tris-HCl, 150 mM NaCl, 10 mM imidazole pH 8.0.

Figure 1

Alignment of the sequence of the SARS-CoV Nsp9 protein with those of bovine coronavirus (BCV) and of the transmissible gastroenteritis virus (TGV). Conserved residues are identified with a black background. Homologous residues are boxed.

Cellular suspensions were thawed with 0.25 mg ml⁻¹ lysozyme, 0.1 µg ml⁻¹ DNase and 20 mM MgSO4 and were centrifuged at 12 000g. The supernatant was applied onto an Ni-affinity column connected to an FPLC system (Amersham Pharmacia Biotech). The protein was eluted with 50 mM Tris-HCl, 150 mM NaCl, 250 mM imidazole pH 8.0 and then applied onto a preparative Superdex 200 gel-filtration column pre-equilibrated in 10 mM Tris-HCl, 300 mM NaCl pH 8.0. The recombinant protein was characterized by N-terminal sequencing, mass spectroscopy, dynamic light scattering (DLS) and circular dichroism (CD).


Table 1

Crystal parameters and data-reduction statistics of the Nsp9 protein crystals.

Values in parentheses are for the last resolution shell.

Space group                 P6₁22 or P6₅22
Unit-cell parameters (Å)    a = b = 89.7, c = 136.7
Beamline                    ID14-EH1 at ESRF (λ = 0.934 Å)
Resolution (Å)              26.0-2.8 (2.94-2.8)
Rsym (%)                    5.3 (28.1)
I/σ(I)                      9.9 (2.5)
No. reflections             90899 (11486)
No. unique reflections      8395 (1166)
Completeness (%)            98.7 (98.7)
Multiplicity                10.8 (9.9)
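The multiplicity row of Table 1 follows directly from the two reflection counts, which gives a simple internal check on the table. The sketch below recomputes it as the ratio of measured to unique reflections; the function name is illustrative and not taken from any data-reduction program.

```python
def multiplicity(total_reflections: int, unique_reflections: int) -> float:
    """Average number of measurements per unique reflection."""
    return total_reflections / unique_reflections


# Counts from Table 1: overall, then the last resolution shell (2.94-2.8 A).
overall = multiplicity(90899, 8395)
last_shell = multiplicity(11486, 1166)

print(f"overall multiplicity:    {overall:.1f}")
print(f"last-shell multiplicity: {last_shell:.1f}")
```

Both values round to the figures quoted in the table.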


2.3. Protein characterization

DLS was performed with a Dynapro Microsampler (Protein Solutions) using a protein solution at 5.8 mg ml⁻¹ in 10 mM Tris-HCl, 300 mM NaCl pH 8.0. The CD spectrum of the final purified product was recorded between 185 and 260 nm on a JASCO J810 spectrometer using a protein solution at 0.1 mg ml⁻¹ in sodium phosphate buffer pH 7.0 containing 25 mM NaCl.

2.4. Crystallization

Crystallization screening was performed by vapour diffusion with nanodrops using a Cartesian robot as described previously (Sulzenbacher et al., 2002; Vincentelli et al., 2003). Briefly, three commercial kits were used: Wizard Screens 1 and 2 (Emerald BioStructures), Structure Screens 1 and 2 and Stura Footprint screen (Molecular Dimensions Ltd). The crystals were obtained in 2.0 M ammonium sulfate, 0.1 M phosphate-citrate pH 4.2 and with a protein concentration of 5.8 mg ml⁻¹ in the gel-filtration buffer. The optimization of the crystallogenesis was performed with nanodrops in a two-dimensional matrix (Lartigue et al., 2003) with a precipitant range of 1.8-2.2 M ammonium sulfate and a pH range of 4.0-4.5 (0.1 M phosphate-citrate), leading to a crystal size of ~100 × 100 × 80 µm (Fig. 2).

Figure 2

Optimized crystals of the SARS-CoV Nsp9 protein. The scale bar is 100 µm.
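The two-dimensional optimization matrix described in Section 2.4 varies ammonium sulfate concentration against pH around the initial hit. A minimal sketch of such a grid is given below. The ranges are those stated in the text, but the step sizes (0.1 M and 0.1 pH unit) are assumptions chosen for illustration, since the actual increments are not reported.

```python
from itertools import product


def linear_steps(start: float, stop: float, step: float) -> list:
    """Evenly spaced values from start to stop inclusive, rounded to 2 decimals."""
    n_intervals = round((stop - start) / step)
    return [round(start + i * step, 2) for i in range(n_intervals + 1)]


# Ranges from the text; step sizes are hypothetical.
sulfate_molar = linear_steps(1.8, 2.2, 0.1)  # ammonium sulfate (M)
ph_values = linear_steps(4.0, 4.5, 0.1)      # 0.1 M phosphate-citrate pH

# Every (precipitant, pH) combination corresponds to one drop in the matrix.
grid = list(product(sulfate_molar, ph_values))

print(f"{len(grid)} conditions")
for conc, ph in grid[:3]:
    print(f"{conc:.1f} M ammonium sulfate, pH {ph:.1f}")
```

With these assumed steps the matrix contains the original hit condition (2.0 M, pH 4.2) near its centre.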


2.5. Data collection 

The crystals were cryocooled in pure silicone oil DC200. Data were collected on beamline ID14-EH1, ESRF, Grenoble using a Quantum ADSC Q4R detector. A total of 110 oscillation images of 1° each were recorded with a crystal-to-detector distance of 180 mm and an exposure time of 9 s per frame. Diffraction data were integrated with DENZO (Otwinowski & Minor, 1997) and were reduced with SCALA (Collaborative Computational Project, Number 4, 1994).
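The data-collection geometry can be related to the reported resolution limit through Bragg's law. The sketch below estimates where a 2.8 Å reflection falls on the detector at the stated distance, using the wavelength given in Table 1 (0.934 Å), and totals the rotation range and exposure. It assumes a flat detector perpendicular to the beam with the direct beam at its centre; these geometric details are not stated in the text.

```python
import math


def spot_radius_mm(d_spacing: float, wavelength: float, distance_mm: float) -> float:
    """Distance from the direct beam to a reflection of given d-spacing.

    Uses Bragg's law (lambda = 2 d sin(theta)) and assumes a flat detector
    normal to the beam, so the radius is distance * tan(2 * theta).
    """
    theta = math.asin(wavelength / (2.0 * d_spacing))
    return distance_mm * math.tan(2.0 * theta)


# Values from the text: 2.8 A resolution, 0.934 A wavelength, 180 mm distance.
radius = spot_radius_mm(d_spacing=2.8, wavelength=0.934, distance_mm=180.0)

# 110 frames of 1 degree, 9 s each.
total_rotation_deg = 110 * 1.0
total_exposure_s = 110 * 9

print(f"2.8 A reflections fall about {radius:.1f} mm from the beam centre")
print(f"total rotation {total_rotation_deg:.0f} deg, exposure {total_exposure_s} s")
```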

3. Results and discussion 

3.1. E. coli protein expression and purification 

We have subcloned 35 SARS-CoV targets in the Gateway system, including 20 full-length proteins and 15 protein domains. To date, 70 constructs have been generated, of which 28 were expressed, 14 were soluble and five were purified. Four of the purified proteins yielded small crystals, among which were those of the Nsp9 protein described in this report. Expression of selenomethionine-substituted Nsp9 was performed using the method of methionine-biosynthesis pathway inhibition (Doublié, 1997). Purification of the selenomethionine protein was performed as described above and crystal optimization is under way.
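The attrition through the pipeline reported above can be summarized as stage-to-stage success rates. The short sketch below computes them from the counts in the text; the stage labels are shorthand introduced here for illustration.

```python
# Counts reported in Section 3.1, in pipeline order.
stages = [
    ("constructs", 70),
    ("expressed", 28),
    ("soluble", 14),
    ("purified", 5),
    ("crystallized", 4),
]


def stage_yields(stage_counts):
    """Fraction of each stage's input that reaches that stage."""
    pairs = zip(stage_counts, stage_counts[1:])
    return [(name, count, count / previous) for (_, previous), (name, count) in pairs]


yields = stage_yields(stages)
for name, count, fraction in yields:
    print(f"{name:>12}: {count:2d} ({fraction:.0%} of previous stage)")

# Overall fraction of constructs that gave crystals.
overall = stages[-1][1] / stages[0][1]
print(f"overall: {overall:.1%} of constructs crystallized")
```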

3.2. Data collection and reduction 

Nsp9 crystals diffract to 2.8 Å resolution at ID14-EH1 (ESRF, Grenoble). Data integration and reduction indicate hexagonal symmetry consistent with space group P622. Rsym is 5.3%, an excellent value considering the redundancy of the data (Table 1). Along the c axis (00l), reflections are observed only when l is a multiple of six, indicating that the space group is either P6₁22 or its enantiomorph P6₅22. The unit-cell parameters are a = b = 89.7, c = 136.7 Å, which lead to a V_M value of 3.1 Å³ Da⁻¹ (60% solvent) with two molecules in the asymmetric unit (Matthews, 1968). The observed distributions of centric and acentric intensities overlap with the theoretical curves, an indication that merohedral twinning, a feature that is often observed in trigonal or hexagonal crystals, is not present.
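The Matthews coefficient quoted above can be reproduced from the unit-cell parameters. The sketch below computes the hexagonal cell volume, divides it by the total protein mass in the cell and converts V_M to a solvent fraction with the usual approximation 1 - 1.23/V_M. The monomer mass of 13 kDa is an assumption (roughly a 113-residue protein plus a hexahistidine tag) and is not a value given in the text; Z = 12 is the number of asymmetric units in P6₁22 or P6₅22.

```python
import math


def hexagonal_cell_volume(a: float, c: float) -> float:
    """Volume (A^3) of a hexagonal cell with a = b, gamma = 120 degrees."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(volume: float, z: int, n_mol: int, mass_da: float) -> float:
    """V_M in A^3 per Da: cell volume divided by total protein mass in the cell."""
    return volume / (z * n_mol * mass_da)


def solvent_fraction(v_m: float) -> float:
    """Approximate solvent fraction, 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / v_m


A_CELL, C_CELL = 89.7, 136.7  # unit-cell parameters from the text (A)
Z = 12                        # asymmetric units per cell in P6(1)22 / P6(5)22
MONOMER_MASS_DA = 13000.0     # ASSUMPTION: approximate mass of His-tagged Nsp9

volume = hexagonal_cell_volume(A_CELL, C_CELL)
v_m = matthews_coefficient(volume, Z, 2, MONOMER_MASS_DA)
solvent = solvent_fraction(v_m)

print(f"cell volume = {volume:.0f} A^3")
# Tabulate one to three molecules per asymmetric unit for comparison.
for n_mol in (1, 2, 3):
    vm_n = matthews_coefficient(volume, Z, n_mol, MONOMER_MASS_DA)
    print(f"{n_mol} molecule(s): V_M = {vm_n:.2f} A^3/Da, solvent = {solvent_fraction(vm_n):.0%}")
```

With the assumed mass, two molecules per asymmetric unit give values that round to the reported 3.1 Å³ Da⁻¹ and 60% solvent.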

3.3. Characterization 

SARS-CoV Nsp9 has been purified to homogeneity in two steps. The identity of the final product has been confirmed by N-terminal sequencing. The oligomeric state of Nsp9 has been checked using gel filtration and DLS. The former technique indicates that the protein is monomeric, while the DLS analysis is consistent with a monodisperse species with an apparent Stokes radius of 26 Å and an equivalent mass of 31 kDa, which corresponds to a dimer. This discrepancy might be related to the different protein concentrations used in the two techniques.
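The assignment of the DLS species as a dimer rests on comparing the apparent mass with the monomer mass. A minimal sketch of that comparison follows; the monomer mass of 13 kDa is an assumption (it is not stated in the text), and the conversion from Stokes radius to mass performed by the instrument software is not reproduced here.

```python
def apparent_oligomer(apparent_mass_kda: float, monomer_mass_kda: float):
    """Return the mass ratio and the nearest whole number of subunits."""
    ratio = apparent_mass_kda / monomer_mass_kda
    return ratio, max(1, round(ratio))


DLS_MASS_KDA = 31.0      # equivalent mass reported from DLS
MONOMER_MASS_KDA = 13.0  # ASSUMPTION: approximate mass of His-tagged Nsp9

ratio, subunits = apparent_oligomer(DLS_MASS_KDA, MONOMER_MASS_KDA)
print(f"mass ratio = {ratio:.2f}, nearest oligomer = {subunits}-mer")
```

The ratio lies between two and three, with two as the nearest integer; an apparent mass derived from a hydrodynamic radius also depends on molecular shape and hydration, so the estimate is approximate.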

A PSI-Blast search retrieved seven homologous sequences, all belonging to members of the Coronaviridae family. They were aligned using MULTALIGN (Corpet, 1988) with standard options. The consensus of the secondary-structure predictions obtained with JPRED (Cuff et al., 1998), PSI-PRED (McGuffin et al., 2000) and PREDICT PROTEIN (Rost, 1996) converges to a fold of seven β-strands. A fold-recognition analysis was performed with the threading programs 3D-PSSM (Kelley et al., 2000) and INBGU (Fischer, 2000). Both programs fail to detect any protein homologue of Nsp9, but converge to a fold of two seven-stranded β-sheets. In agreement, the CD spectrum of purified Nsp9 reveals a structured protein formed by a majority of β-strands (35%) and β-turns (18%), but which also contains 15% α-helix. Random-coil segments account for 32% of the total.

4. Conclusion 

The SARS-CoV Nsp9 protein expressed in E. coli was readily crystallized using the nanodrop screening (Sulzenbacher et al., 2002) and optimization (Lartigue et al., 2003) approaches. Crystals diffract to 2.8 Å resolution and are amenable to structure determination using SeMet substitution and MAD methods (Hendrickson, 1991) at synchrotrons.
Goethe University, Frankfurt-am-Main, Germany) for